Vision and Language Integration: Moving beyond Objects
نویسندگان
چکیده
The last years have seen an explosion of work on the integration of vision and language data. New tasks like Image Captioning and Visual Questions Answering have been proposed and impressive results have been achieved. There is now a shared desire to gain an in-depth understanding of the strengths and weaknesses of those models. To this end, several datasets have been proposed to try and challenge the state-of-the-art. Those datasets, however, mostly focus on the interpretation of objects (as denoted by nouns in the corresponding captions). In this paper, we reuse a previously proposed methodology to evaluate the ability of current systems to move beyond objects and deal with attributes (as denoted by adjectives), actions (verbs), manner (adverbs) and spatial relations (prepositions). We show that the coarse representations given by current approaches are not informative enough to interpret attributes or actions, whilst spatial relations somewhat fare better, but only in attention models.
منابع مشابه
A Novel Method for Tracking Moving Objects using Block-Based Similarity
Extracting and tracking active objects are two major issues in surveillance and monitoring applications such as nuclear reactors, mine security, and traffic controllers. In this paper, a block-based similarity algorithm is proposed in order to detect and track objects in the successive frames. We define similarity and cost functions based on the features of the blocks, leading to less computati...
متن کاملExtending the Qualitative Trajectory Calculus Based on the Concept of Accessibility of Moving Objects in the Paths
Qualitative spatial representation and reasoning are among the important capabilities in intelligent geospatial information system development. Although a large contribution to the study of moving objects has been attributed to the quantitative use and analysis of data, such calculations are ineffective when there is little inaccurate data on position and geometry or when explicitly explaining ...
متن کاملView combination in moving objects: The role of motion in discriminating between novel views of similar and distinctive objects by humans and pigeons
Humans and pigeons were trained to discriminate between views of similar and distinctive objects that rotated in depth coherently or non-coherently. We tested novel views that were either moving or static and were either between the training viewpoints or beyond them. With both types of motion, both species recognized views between the training viewpoints better than views beyond this range. Ad...
متن کاملA PRACTICAL APPROACH TO REAL-TIME DYNAMIC BACKGROUND GENERATION BASED ON A TEMPORAL MEDIAN FILTER
In many computer vision applications, segmenting and extraction of moving objects in video sequences is an essential task. Background subtraction, by which each input image is subtracted from the reference image, has often been used for this purpose. In this paper, we offer a novel background-subtraction technique for real-time dynamic background generation using color images that are taken fro...
متن کاملA Navigation System for Autonomous Robot Operating in Unknown and Dynamic Environment: Escaping Algorithm
In this study, the problem of navigation in dynamic and unknown environment is investigated and a navigation method based on force field approach is suggested. It is assumed that the robot performs navigation in...
متن کامل